Distinguishing the intrinsic subtypes of breast cancer is crucial for deciding the optimal treatment strategy. Deep learning can predict the subtypes from genetic information more accurately than conventional statistical methods, but to date deep learning has not been directly used to examine which genes are associated with which subtypes. To clarify the mechanisms embedded in the intrinsic subtypes, we developed an explainable deep learning model called the point-wise linear (PWL) model, which generates a custom-made logistic regression for each patient. Logistic regression is familiar to both physicians and medical informatics researchers and allows us to analyze the importance of feature variables, and the PWL model harnesses these practical strengths of logistic regression. In this study, we show that analyzing breast cancer subtypes is beneficial for patients and is also one of the best ways to validate the capability of the PWL model. First, we trained the PWL model with RNA-seq data to predict the PAM50 intrinsic subtypes and applied it to the 41/50 genes of PAM50 through the subtype prediction task. Second, we developed a deep enrichment analysis method to reveal the relationships between the PAM50 subtypes and the copy numbers of breast cancer. Our findings showed that the PWL model utilized genes relevant to the cell cycle-related pathways. These preliminary successes in breast cancer subtype analysis demonstrate the potential of our analysis strategy to clarify the underlying mechanisms of breast cancer and improve overall clinical outcomes.
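The PWL idea as described amounts to a small weight-generating network that emits patient-specific logistic-regression coefficients. A minimal NumPy sketch of that structure (the two-layer generator, the random weights, and all sizes here are hypothetical illustrations, not the authors' trained architecture):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: 50 gene-expression features, 16 hidden units.
n_genes, n_hidden = 50, 16

# Weight-generating network (a stand-in for the trained PWL generator).
W1 = rng.normal(0, 0.1, (n_hidden, n_genes))
W2 = rng.normal(0, 0.1, (n_genes + 1, n_hidden))  # per-gene coefficients + bias

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def pwl_predict(x):
    """Generate a patient-specific logistic regression and apply it to x."""
    h = np.tanh(W1 @ x)        # encode this patient's expression profile
    coef = W2 @ h              # custom-made coefficients for THIS patient
    w, b = coef[:-1], coef[-1]
    prob = sigmoid(w @ x + b)  # an ordinary logistic regression, per patient
    return prob, w             # w is directly interpretable, gene by gene

x = rng.normal(size=n_genes)   # one patient's (standardized) expression profile
prob, w = pwl_predict(x)
```

Because the final prediction is a plain logistic regression, the per-patient coefficients `w` can be ranked to ask which genes drove an individual subtype call, which is the interpretability the abstract emphasizes.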
translated by Google Translate
Learning-from-Observation (LfO) is a robot teaching framework for programming operations through few-shot human demonstrations. While most previous LfO systems rely on visual demonstrations alone, recent research on robot teaching has shown the effectiveness of verbal instruction in making recognition robust and teaching interactive. To the best of our knowledge, however, few solutions have been proposed for LfO that utilize verbal instruction, i.e., multimodal LfO. This paper proposes a practical pipeline for multimodal LfO. On the input side, the user temporarily stops hand movements to match the granularity of human instructions with the granularity of robot execution. The pipeline recognizes tasks based on step-by-step verbal instructions accompanied by demonstrations, and the recognition is made robust through interactions with the user. We test the pipeline on a real robot and show that a user can successfully teach multiple operations through multimodal demonstrations. The results suggest the utility of the proposed pipeline for multimodal LfO.
Robot developers build various types of robots to satisfy users' diverse demands. Those demands depend on the users' backgrounds, so the robots suitable for each user vary; if a developer offers a user a robot different from the usual one, the robot-specific software has to be changed. At the same time, robot-software developers would like to reuse their developed software as much as possible to reduce their effort. We propose a system design that considers hardware-level reusability, starting from the learning-from-observation framework. This framework represents a target task in a robot-agnostic representation, so the resulting task description can be shared among various robots. When executing the task, the robot-agnostic description must be converted into commands for the target robot. To increase reusability, we first implement the skill library, i.e., robot motion primitives, considering only the robot hand, regarding the robot body as merely a carrier that moves the hand along the target trajectory. The skill library is thus reusable as long as the same robot hand is used. Second, we employ a generic IK solver so that robots can be swapped quickly. We verify the hardware-level reusability by applying two task descriptions to two different robots, Nextage and Fetch.
Assessing the critical view of safety in laparoscopic cholecystectomy requires accurate identification and localization of key anatomical structures, reasoning about their geometric relationships to one another, and determining the quality of their exposure. In this work, we propose to capture each of these aspects by modeling the surgical scene with a disentangled latent scene graph representation, which we can then process using a graph neural network. Unlike previous approaches using graph representations, we explicitly encode in our graphs semantic information such as object locations and shapes, class probabilities and visual features. We also incorporate an auxiliary image reconstruction objective to help train the latent graph representations. We demonstrate the value of these components through comprehensive ablation studies and achieve state-of-the-art results for critical view of safety prediction across multiple experimental settings.
We propose a lightweight and highly efficient joint detection and tracking pipeline for multi-object tracking using a fully transformer-based architecture. It is a modified version of TransTrack that overcomes the computational bottleneck associated with its design while achieving a state-of-the-art MOTA score of 73.20%. The model is driven by a transformer-based backbone instead of a CNN, which scales well with the input resolution. We also propose a drop-in replacement for the feed-forward network of the transformer encoder layer that uses the Butterfly Transform operation to perform channel fusion, and depth-wise convolution to learn spatial context within the feature maps that is otherwise missing from the attention maps. As a result of our modifications, we reduce the overall model size of TransTrack by 58.73% and the complexity by 78.72%. We therefore expect our design to provide novel perspectives for architecture optimization in future research on multi-object tracking.
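A rough illustration of why a Butterfly Transform cuts the FFN cost: it factorizes an n-by-n channel-mixing matrix into log2(n) sparse stages of 2-by-2 blocks, giving O(n log n) parameters instead of O(n^2). The toy NumPy version below shows a generic butterfly factorization, not the authors' code; the channel count and random weights are made up:

```python
import numpy as np

def butterfly_mix(x, stages):
    """Channel fusion via a butterfly factorization: log2(n) sparse stages,
    each mixing channel pairs (i, i + stride) with its own 2x2 block."""
    n = len(x)
    out = x.astype(float).copy()
    for s, W in enumerate(stages):          # W has shape (n // 2, 2, 2)
        stride = 1 << s
        y = out.copy()
        pair = 0
        for block in range(0, n, 2 * stride):
            for i in range(block, block + stride):
                j = i + stride
                a, b = out[i], out[j]
                y[i] = W[pair, 0, 0] * a + W[pair, 0, 1] * b
                y[j] = W[pair, 1, 0] * a + W[pair, 1, 1] * b
                pair += 1
        out = y
    return out

n = 8                                       # toy channel count (a power of two)
k = int(np.log2(n))
rng = np.random.default_rng(0)
stages = [rng.normal(0, 0.5, (n // 2, 2, 2)) for _ in range(k)]

x = rng.normal(size=n)
y = butterfly_mix(x, stages)                # every output channel sees every input

# Sanity check: identity 2x2 blocks leave the channels unchanged.
eye_stages = [np.tile(np.eye(2), (n // 2, 1, 1)) for _ in range(k)]
same = butterfly_mix(x, eye_stages)

# Parameter count: 2 * n * log2(n) for the butterfly vs. n * n for a dense layer.
n_params = sum(W.size for W in stages)
```

After log2(n) stages every output channel depends on every input channel, which is what lets the sparse factorization stand in for a dense channel-fusion layer.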
This paper tackles recipe generation from unsegmented cooking videos, a task that requires an agent to (1) extract key events in completing the dish and (2) generate sentences for the extracted events. Our task is similar to dense video captioning (DVC), which aims to detect events exhaustively and generate sentences for them. Unlike DVC, however, story awareness is crucial in recipe generation: the model should output the appropriate number of key events in the correct order. We analyzed the outputs of DVC models and observed that, although (1) several events are adoptable as a recipe story, (2) the sentences generated for such events are not grounded in the visual content. Based on this, we hypothesize that a correct recipe can be obtained by selecting oracle events from the output events of the DVC model and re-generating sentences for them. To achieve this, we propose a novel transformer-based joint approach that trains an event selector and a sentence generator: the former selects the oracle events from the outputs of the DVC model, and the latter generates grounded sentences for those events. In addition, we extend the model by incorporating ingredients to generate more accurate recipes. Experimental results show that the proposed method outperforms state-of-the-art DVC models. We also confirm that, by modeling recipes in a story-aware manner, the proposed model outputs the appropriate number of events in the correct order.
We present a new multimodal dataset called Visual Recipe Flow, which enables us to learn the result of each cooking action. The dataset consists of object state changes and the workflow of the recipe text. The state changes are represented as image pairs, while the workflow is represented as a recipe flow graph (r-FG). The image pairs are grounded in the r-FG, which provides the cross-modal relations. With our dataset, a range of applications can be attempted, from multimodal commonsense reasoning to procedural text generation.
The theory of learning in games is prominent in the AI community, motivated by several rising applications such as multi-agent reinforcement learning and generative adversarial networks. We propose mutation-driven multiplicative weights update (M2WU) for learning an equilibrium in two-player zero-sum normal-form games, and prove that it exhibits the last-iterate convergence property in both the full and noisy information feedback settings. In the full information feedback setting, players observe the exact gradient vectors of their utility functions; in the noisy information feedback setting, they can only observe noisy gradient vectors. Existing algorithms, including the well-known multiplicative weights update (MWU) and optimistic MWU (OMWU) algorithms, fail to converge to a Nash equilibrium under noisy information feedback. In contrast, M2WU exhibits last-iterate convergence to a stationary point near the Nash equilibrium in both feedback settings. We then prove that it converges to an exact Nash equilibrium by iteratively adapting the mutation term. We empirically confirm that M2WU outperforms MWU and OMWU in exploitability and convergence rate.
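The mutation idea can be sketched on rock-paper-scissors, whose interior Nash equilibrium is the uniform strategy: vanilla MWU orbits the equilibrium, while adding a mutation term that pulls each strategy toward a reference distribution stabilizes the last iterate. The update below follows the general scheme described above, but the step sizes and the exact form of the mutation term are illustrative assumptions, not the paper's precise algorithm:

```python
import numpy as np

# Rock-paper-scissors payoff matrix for the row player (zero-sum game).
A = np.array([[0., -1., 1.],
              [1., 0., -1.],
              [-1., 1., 0.]])

def m2wu_like(A, eta=0.1, mu=0.1, T=5000):
    """Multiplicative weights with a mutation term pulling both players
    toward a uniform reference strategy r (illustrative variant)."""
    n, m = A.shape
    x = np.array([0.5, 0.3, 0.2])    # row player's mixed strategy
    y = np.array([0.2, 0.5, 0.3])    # column player's mixed strategy
    r = np.full(n, 1.0 / n)          # reference strategy (mutation target)
    for _ in range(T):
        gx = A @ y + mu * (r / x - 1.0)       # perturbed utility gradients
        gy = -A.T @ x + mu * (r / y - 1.0)
        x = x * np.exp(eta * gx); x /= x.sum()
        y = y * np.exp(eta * gy); y /= y.sum()
    return x, y

x, y = m2wu_like(A)   # last iterates, not time averages
```

With `mu = 0` this reduces to plain MWU, whose iterates cycle around the equilibrium instead of approaching it; the mutation term makes the uniform Nash equilibrium an attracting fixed point, mirroring the last-iterate convergence property claimed in the abstract.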
This paper collects all descriptions of the solvers and the ISR instances submitted to the CoRe Challenge 2022.
Retrieving the missing dimension of acoustic images from 2D forward-looking sonar is a well-known problem in the field of underwater robotics. Some works attempt to retrieve 3D information from a single image, which allows a robot to generate 3D maps with a fly-through motion. However, owing to the unique image formation principle, estimating 3D information from a single image suffers from severe ambiguity. Classical multi-view stereo methods can avoid the ambiguity problem but may require a large number of viewpoints to generate an accurate model. In this work, we propose a novel learning-based multi-view stereo method to estimate the 3D information. To better utilize the information from multiple frames, an elevation-plane sweeping method is proposed to generate depth-azimuth cost volumes. The regularized volumes can be considered probabilistic volumetric representations of the target. Instead of regressing the elevation angles directly, we use pseudo front depth to represent the 3D information, which avoids the 2D-3D problem in acoustic imaging. High-accuracy results can be generated with only two or three images. We generated synthetic datasets to simulate various underwater targets, and also built the first real dataset with accurate ground truth in a large water tank. Experimental results demonstrate the superiority of our method over other state-of-the-art methods.
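The plane-sweeping idea behind the cost volume can be illustrated with the classical 1D version: hypothesize each candidate offset (a horizontal disparity here, standing in for the elevation planes), measure photometric consistency between the views under that hypothesis, and take the best-scoring one. Everything below (the signals, the offset range) is a made-up toy, not sonar geometry:

```python
import numpy as np

rng = np.random.default_rng(0)

N = 64
view1 = rng.normal(size=N)                       # reference "image" (1-D signal)
true_shift = 5
view2 = np.concatenate([np.zeros(true_shift), view1])[:N]  # second view, offset by 5

# Sweep the candidate planes/offsets and build a 1-D cost volume:
# low cost means the two views agree under that hypothesis.
candidates = range(16)
cost_volume = np.array([
    np.abs(view1[:N - d] - view2[d:]).mean() for d in candidates
])

best = int(np.argmin(cost_volume))               # winner-take-all over the volume
```

In the paper's setting the sweep is over elevation planes and the resulting volume is regularized by a network into a probabilistic representation before the depth is read out, rather than decided by a hard argmin as in this toy.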